Regression_Step 05 Modeling¶

Introduction¶

Going back to step 01, let's review our goals.

Objectives This project focuses on quantitative data science and mainly aims to perform two goals. One is to predict housing price based on multiple variables such as room size or building types etc. The other is to forecast sales of a real estate company.

These goals is the time series forecast part and is comlpeted in the differet file. Here, we focus on teh former goal, which is to make a financial model for a housnig price.

Using this model, we analyze some housing price transaction in 2018. Again, we are assuming we were in Jan, 2018.

Imports¶

import warnings
warnings.filterwarnings('ignore')

import pandas as pd
import numpy as np
import os
import matplotlib.pyplot as plt
import seaborn as sns

from library.sb_utils import save_file

# For normalization
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# For metrics
from sklearn.metrics import mean_squared_error,mean_absolute_error,explained_variance_score

%matplotlib inline

%pdb 0

Automatic pdb calling has been turned OFF

Set parameteres¶

# Set the number of display
pd.set_option('display.max_columns', 100)
pd.set_option('display.max_rows', 100)
pd.set_option('display.max_colwidth', -1)  # or 199

pd.options.display.float_format = '{:.4f}'.format

def load_CVmodel(model_path,expected_model_version = '1.0'):
    import pickle
    from sklearn import __version__ as sklearn_version
    
    if os.path.exists(model_path):
        with open(model_path, 'rb') as f:
            model = pickle.load(f)
            print("Expected model version has been loaded")
        if model.version != expected_model_version:
            print("Expected model version doesn't match version loaded")
        if model.sklearn_version != sklearn_version:
            print("Warning: model created under different sklearn version")
        return model

    else:
        print("Expected model not found. Run the model fit")

# https://github.com/jbmouret/matplotlib_for_papers/blob/master/src/plot_variance_matplotlib_white.py
def clean_layout(ax):
    # Delete unnecessary lines
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    ax.spines['left'].set_visible(False)

    try:
        # Remove frame color
        frame = ax.legend_.get_frame()
        frame.set_facecolor('1.0')
        frame.set_edgecolor('1.0')
    except:
        pass

    # y axis grid line
    ax.grid(axis='y', color="0.9", linestyle='-', linewidth=1)
    ax.set_axisbelow(True)

Load Data ¶

df0 = pd.read_csv('../data/df_data_step3b_EDA.csv')

df_sales = pd.read_csv('../data/time_sales_data_step3b_EDA.csv',index_col=None)

ModelComparison = pd.read_csv('../data/ModelComparison_data_step4_EDA.csv',index_col=None)

# This is necessary for making pipeline
dummy_step04 = pd.read_csv('../data/dummies_in_step4_EDA.csv',index_col=None)

df0.head()

The financial models to use¶

# check the best stack model and the models that used in the stack model
ModelComparison.iloc[[7,9,10]]

The model we use is Stacking ensemble, which is based on the other two models.

Add some additional transformer for a pipeline¶

Imoprt necessary libraries¶

from sklearn.pipeline import Pipeline
from sklearn.base import BaseEstimator

Data to use¶

As we are assuming we were in Jan, 2016, the data is set to only 2017.

Drop unnecessary columns¶

# Drop the columns unnecessary
class DropColumns(BaseEstimator):
    
    def __init__(self):
        pass
    
    def fit(self, X, y=None):
        return self
    
    def transform(self, X: pd.DataFrame, y=None):
        X = X.drop(['url','id','Cid','tradeTime', 'totalPrice'],axis=1) # The model was only trained on Metropolitan    
        return X

Dummy columns¶

The process is the same as step04

dummy_step04.columns

Index(['buildingType_2', 'buildingType_3', 'buildingType_4', 'buildingType_5',
       'elevator_1', 'renovationCondition_2', 'renovationCondition_3',
       'renovationCondition_4', 'buildingStructure_2', 'buildingStructure_3',
       'buildingStructure_4', 'buildingStructure_5', 'buildingStructure_6',
       'fiveYearsProperty_1', 'subway_1', 'district_10', 'district_11',
       'district_12', 'district_13', 'district_2', 'district_3', 'district_4',
       'district_5', 'district_6', 'district_7', 'district_8', 'district_9',
       'floorType_1', 'floorType_2', 'floorType_3', 'floorType_4',
       'floorType_5', 'DOM_cat_1', 'DOM_cat_2'],
      dtype='object')

class DummyColumns(BaseEstimator):
    
    def _init__(self):
        pass
    
    def fit(self,X, y=None):
        return self
    
    def transform(self, X, y=None):
        
        # Set feature values to make dummies for
        ls_label = ['buildingType','elevator','renovationCondition' , 'buildingStructure', 'fiveYearsProperty', 'subway', 'district', 'floorType', 'DOM_cat']
        dummies_columns = dummy_step04.columns
        
        dummies = pd.get_dummies(X[ls_label].astype(str),
#                                  drop_first=True
                                )
        dummies = dummies.reindex(columns = dummies_columns, fill_value=0)
        res = pd.concat([X, dummies], axis=1)
        res = res.drop(ls_label, axis=1)
        
        return res

Build a Pipeline¶

Import the best model in step04¶

model_path = '../models/stack_opt.pkl'
stack = load_CVmodel(model_path)

Expected model version has been loaded

Define the pipeline¶

from sklearn.pipeline import make_pipeline,Pipeline

# structure_pipeline = Pipeline([
#                 ("DropColumns", DropColumns()),
#                 ("DummyColumns", DummyColumns()),
#                 ])

financialModel_1_pipe = Pipeline([
                ("DropColumns", DropColumns()),
                ("DummyColumns", DummyColumns()),
                ("stack",stack),
                ])

Make a prediction¶

def make_prediction(df,model_pipeline, model_name:str):
    
    # Create train and test data set
    # This is for modeling, which is the data in 2018
    df_train = df.drop(['price'],axis=1)
    df_test = df[['price']]
    
#     data = structure_pipeline.transform(df_train)   # Save the transformation to a separate df
    predictions = model_pipeline.predict(df_train).reshape(-1,1)   # Make the prediction on the original df
    
    # Append the prediction to the separate transformed data
#     data[f'preidcitons_price'] = y_scaler.inverse_transform(predictions)
    
    data = df.copy()
    data['preidcitons_price'] = predictions
    data.rename(columns={'preidcitons_price':f'modelPrice_{model_name}'}, inplace=True)
#     data['price'] = df_test
    return data

price_prediction_stack = make_prediction(df0[df0.year==2018],financialModel_1_pipe, 'stack')

price_prediction_stack.tail()

# Difference between price and model price
def calModelGap(df):
    df['gap'] = df['price']-df['modelPrice_stack']
    
calModelGap(price_prediction_stack)
sns.histplot(price_prediction_stack['gap'],kde=True)
sns.despine(left=True);

This tells price in actual transaction is more inexpensive than listed price.

Let's check the the worst 5.

price_prediction_stack.sort_values(by='gap',ascending=True,)[['price','modelPrice_stack','gap']].head()

Problem 1: Houses with too low price¶

These houses will be found by making use of standard deviation.

sigma = price_prediction_stack['gap'].std()
sigma

6170.659098860616

price_prediction_stack['priceTooLow'] = price_prediction_stack['gap'].apply(lambda x: 1 if x<-2*sigma  else 0)

tooLow = price_prediction_stack[price_prediction_stack.priceTooLow==1]

tooLow.head()

tooLow_new = tooLow.iloc[:,:31].copy()

tooLow_new.renovationCondition = int(4)

tooLow_new.head()

tooLow_new_reno4 = make_prediction(tooLow_new,financialModel_1_pipe, 'stack')

tooLow_new_reno4[['renovationCondition','modelPrice_stack']]
len(tooLow_new_reno4[['renovationCondition','modelPrice_stack']])

21

tooLow['reno_4'] = tooLow_new_reno4['modelPrice_stack']

table1 = tooLow[['renovationCondition','modelPrice_stack','reno_4']]

# Expected increase in total sales
tooLow_new_reno4['modelPrice_stack'].sum() - tooLow_new_reno4['price'].sum()

383436.85648112115

# calculate the gap between before-price and after-price
table1['gap'] = table1['reno_4']-table1['modelPrice_stack']

# filter out the houses which prices can be improved by hard floor renovation
temp = table1[table1['gap'] > table1['gap'].std()]

# rename the table
tempp = temp.rename(columns={
                    'renovationCondition':'Condition before renovation',
                    'modelPrice_stack':'price before',
                    'reno_4':'price after',
                    
                    
                    
                    })

tempp['percent increase'] = tempp['gap']/tempp['price before']*100

# 
tempp

# 
tempp.describe()

Q1: How does renovation contribute to increase in unit price ('price')?¶

Check how gap distributes against renovationCategory¶

sns.boxplot(x="renovationCondition", y="gap", data=price_prediction_stack)
sns.stripplot(x="renovationCondition", y="gap", data=price_prediction_stack, marker="o", alpha=0.3, color="black", order=[1,2,3,4]);
sns.despine(left=True);

Definition of reonvation condition category

renovationCondition == 1 - "Other"
renovationCondition == 2 ~ "Rough"
renovationCondition == 3 ~ "Simplicity"
renovationCondition == 4 ~ "Hardcover"

sns.boxplot(x="renovationCondition", y="modelPrice_stack", data=price_prediction_stack,)
sns.stripplot(x="renovationCondition", y="modelPrice_stack", data=price_prediction_stack, marker="o", alpha=0.3, color="black", order=[1,2,3,4]);
sns.despine(left=True);

Q2 How different are Hardcover renovationa (cat4) and Simplicity renovation (cat3)?¶

According to this, there seem to be no difference between Hardcover renovationa (cat4) and Simplicity renovation (cat3). If this is true, there is no merit to spend extra money for renovation.

Next, we statistically test this.

# Create a column called `Permutation1`, and assign to it the result of permuting (shuffling) the price column
price_prediction_stack['Permutation1'] = np.random.permutation(price_prediction_stack.price)

# Call the describe() method on our permutation grouped by 'platform'. 
price_prediction_stack.groupby('renovationCondition')[['Permutation1']].describe()


price_prediction_stack.groupby('renovationCondition')[['modelPrice_stack','Permutation1']].describe().T

price_prediction_stack.groupby('renovationCondition')[['Permutation1']].mean()

renoCat1 = price_prediction_stack[price_prediction_stack.renovationCondition==1].modelPrice_stack
renoCat2 = price_prediction_stack[price_prediction_stack.renovationCondition==2].modelPrice_stack
renoCat3 = price_prediction_stack[price_prediction_stack.renovationCondition==3].modelPrice_stack
renoCat4 = price_prediction_stack[price_prediction_stack.renovationCondition==4].modelPrice_stack

Check the distribution of the data¶

from scipy import stats

renoCat3_normal = stats.normaltest(renoCat3)
renoCat3_normal

NormaltestResult(statistic=7.441407966140893, pvalue=0.024216913546271368)

renoCat4_normal = stats.normaltest(renoCat4)
renoCat4_normal

NormaltestResult(statistic=21.167788072776986, pvalue=2.532055512935648e-05)

Both are not normal.

Hypothesis formulation¶

Our Null hypothesis is just:

H null: the observed difference in the mean Hardcover renovationa (cat4) and mean Simplicity renovation (cat3) is due to chance (and thus not due to the renovation).

H alternative: the observed difference is not due to chance (and is actually due to renovation)

We're also going to pick a significance level of 0.05.

Permutation test¶

Is there wny difference among categories? Here, we perform permutaiton test. The data to use is modelPrice_stack predicted by the financial model.

sns.histplot(renoCat3,kde=True,color='r')
sns.histplot(renoCat4,kde=True,color='b')
sns.despine(left=True);

# Another way per the Datacamp in sec.11.3

np.random.seed(42)
def permutation_sample(data1, data2):
    """Generate a permutation sample from two data sets."""

    # Concatenate the data sets: data
    data = np.concatenate((data1, data2))

    # Permute the concatenated array: permuted_data
    permuted_data = np.random.permutation(data)

    # Split the permuted array into two: perm_sample_1, perm_sample_2
    perm_sample_1 = permuted_data[:len(data1)]
    perm_sample_2 = permuted_data[len(data1):]

    return perm_sample_1, perm_sample_2

def diff_of_means(data_1, data_2):
    """Difference in means of two arrays."""

    # The difference of means of data_1, data_2: diff
    diff = np.mean(data_1)-np.mean(data_2)

    return diff

def draw_perm_reps(data_1, data_2, func, size=1):
    """Generate multiple permutation replicates."""

    # Initialize array of replicates: perm_replicates
    perm_replicates = np.empty(size)

    for i in range(size):
        # Generate permutation sample
        perm_sample_1, perm_sample_2 = permutation_sample(data_1, data_2)

        # Compute the test statistic
        perm_replicates[i] = func(perm_sample_1, perm_sample_2)

    return perm_replicates

def permutationTest(a,b):
    # Compute difference of mean impact force from experiment: empirical_diff_means
    empirical_diff_means = diff_of_means(a,b)

    # Draw 10,000 permutation replicates: perm_replicates
    perm_replicates = draw_perm_reps(a,b,diff_of_means, size=10000)

    # Compute p-value: p
    # two-tailed
    p = np.sum(perm_replicates >= empirical_diff_means)/ len(perm_replicates)

    # Print the result
    print('empirical_diff_means ',empirical_diff_means)
    print('p-value =', p)
    
    
    plt.hist(perm_replicates)
    plt.axvline(empirical_diff_means, color='r')

permutationTest(renoCat3,renoCat4)
sns.despine(left=True);

empirical_diff_means  2138.0667016161024
p-value = 0.2631

There is no difference between 3 and 4 renovationCondition. Based on this result, we cannot deny the null hypoothesis that the distributions are same. The benefit of renovation is not significant in terms of price appreciation.

T-test for the means of two independent samples¶

import random
from scipy.stats import ttest_ind

ttest,pval = ttest_ind(renoCat3,renoCat4, equal_var = False)
print("ttest",ttest)
print('p value',format(pval, '.70f'))

if pval <0.05:
    print("we reject null hypothesis")
else:
    print("we accept null hypothesis")

ttest 0.6460900335843099
p value 0.5191130687439893254264688948751427233219146728515625000000000000000000
we accept null hypothesis

Problem 2: How to increase price by 15% by any additional changes within the next 3 months?¶

One of the methods to increase a housing price is renovation. How much appreciation in price will be expected if Hardcover renovation (cat4) are applied to all houses? Typical renovation is done within one month, and only concern here is to check if the improvement in price exceeds by 15% or not.

# Make a dataframe for this question
dfQ2 = df0[df0.year==2018]

# Change therenovationCondition to 4
dfQ2.renovationCondition = 4
dfQ2.head()

# Predict new prices
Q2 = make_prediction(dfQ2,financialModel_1_pipe, 'stack')
Q2.rename(columns={'modelPrice_stack':'price_renovation3to4'}, inplace=True)
price_prediction_stack = pd.concat([price_prediction_stack,Q2.iloc[:,-1:]],axis=1)

price_prediction_stack['renovTo4_pct'] =\
(price_prediction_stack.price_renovation3to4-price_prediction_stack.modelPrice_stack)/price_prediction_stack.modelPrice_stack*100

price_prediction_stack.head()

f =(price_prediction_stack.renovationCondition==1)|(price_prediction_stack.renovationCondition==2)|(price_prediction_stack.renovationCondition==3)

price_prediction_stack[f][['renovTo4_pct']].describe().T

The expected appreciation is as following.

Median: 2.0%
Mean: 2.0%
Std: 1.9%

Based on these, standard deviation is same level as its mean. According to this model, the average improvement in price by renovation is 2.0%. Therefore, if renovation cost is less than sales price times this 2.0%, the renovation may be worth an invest, but it si difficut to say that because the standard deviation is as high as 2.0.

Q4 Which house should be renovated?¶

Let's find the houses that should be renovated with teo sigma confidence.

sigma = price_prediction_stack[f].renovTo4_pct.std()
sigma

1.9166712855484855

# filter out the only price_renovation3to4 that have significant level of positive percentage increase (price_renovation3to4) by more than two sigma.
price_prediction_stack[price_prediction_stack.renovTo4_pct>sigma*2]

Those houses are financial-modelically worthwhile investment of renovation.

Q5 Which features can have significant benefit by renovation?¶

# Add a new column of boolean if renovTo4_pct is positive or not. 

def posiNega(x):
    if x > 0:
        return 1
    else:
        return 0

price_prediction_stack['renoPlus'] = price_prediction_stack['renovTo4_pct'].apply(posiNega)
price_prediction_stack.head()

features = ['followers',
       'totalPrice', 'square', 'livingRoom', 'drawingRoom', 'kitchen',
       'bathRoom', 'buildingType', 'constructionTime', 
       'buildingStructure', 'ladderRatio', 'elevator', 'fiveYearsProperty',
       'subway', 'district', 'year', 'month', 'floorType', 'floorHeight',
       'DOM_cat', '10yrBond_up%', 'hmi_up%', 'sse_up%', 'modelPrice_stack',
       'renoPlus']
# sns.pairplot(price_prediction_stack[features], hue="renoPlus");

Using pairplot, it is found that the columns, 'livingRoom', 'buildingType', 'followers', had more difference between renoPlus boolean than the others.

Next, check the histgrams of these columns to find which houses are better to renovate.

f, ax = plt.subplots(1, 3, figsize=(12, 6), sharex=False)

for i,_ in enumerate(['livingRoom', 'buildingType', 'followers']):
    sns.histplot(data=price_prediction_stack, x=_, multiple="dodge", hue="renoPlus",kde=True,ax=ax[i])
    sns.despine(left=True)

a = price_prediction_stack[price_prediction_stack.renoPlus==0].livingRoom
b = price_prediction_stack[price_prediction_stack.renoPlus==1].livingRoom
permutationTest(a,b)
sns.despine(left=True);

empirical_diff_means  0.05600000000000005
p-value = 0.319

a = price_prediction_stack[price_prediction_stack.renoPlus==0].buildingType
b = price_prediction_stack[price_prediction_stack.renoPlus==1].buildingType
permutationTest(a,b)
sns.despine(left=True);

empirical_diff_means  -0.0129999999999999
p-value = 0.5526

a = price_prediction_stack[price_prediction_stack.renoPlus==0].followers
b = price_prediction_stack[price_prediction_stack.renoPlus==1].followers
permutationTest(a,b)
sns.despine(left=True);

empirical_diff_means  20.08166666666667
p-value = 0.0072

We conclude that renovation will bring significant difference in followers.

Q6 How different are the prices between Listed price(price) vs Predicted price(modelPrice) ?¶

price_prediction_stack[['gap']].describe().T

sig = np.std(price_prediction_stack['gap'])

# price - modelPrice_stack is too low(-)
len(price_prediction_stack[price_prediction_stack.gap<-2*sig].sort_values(by='gap'))

21

# price - modelPrice_stack is too high(+)
# houses which listed prices are much lower thatn predicted model
price_prediction_stack[price_prediction_stack.gap>1.9*sig].sort_values(by='gap',ascending = False).head(10)

Those may be worthwhile to review and revise those prices.

Summary¶

There are not so many feature values that real estate company can change regarding their product. One of them is renovation, and we have checked how the renovation will contribute to the increase in house price. Our conclusion is that it will unfortunately not bring signicant diffference in the appreciation in housing price, it can be said that it may be brive to cut off renovatio cost as reasonably as possible.

In addition to the renovation factor, we also filter out the houses which prices had better to be reconsidered.

	url	id	Lng	Lat	Cid	tradeTime	totalPrice	price	square	livingRoom	drawingRoom	kitchen	bathRoom	buildingType	constructionTime	renovationCondition	buildingStructure	ladderRatio	elevator	fiveYearsProperty	subway	district	year	month	floorType	floorHeight	DOM_cat	10yrBond_up%	hmi_up%	sse_up%
0	https://bj.lianjia.com/chengjiao/BJHD84136716.html	BJHD84136716	116.2843	39.9427	1111027378362	2012-01-01	618.0000	38868	159.0000	3	2	2	1	4	2001	1	2	0.5000	1	0	1	8	2012	1	0	6	N	-0.0423	-0.0212	0.0424
1	https://bj.lianjia.com/chengjiao/BJCY84233046.html	BJCY84233046	116.5054	39.9391	1111027379229	2012-01-01	95.0000	20145	47.1600	1	1	1	1	4	1995	1	2	0.5000	0	0	0	7	2012	1	3	6	N	-0.0423	-0.0212	0.0424
2	https://bj.lianjia.com/chengjiao/BJCY84129495.html	BJCY84129495	116.4551	39.9315	1111027376792	2012-01-01	223.0000	28916	77.1200	3	1	1	1	4	1988	1	4	0.3330	0	1	1	7	2012	1	5	6	N	-0.0423	-0.0212	0.0424
3	https://bj.lianjia.com/chengjiao/BJCY84131603.html	BJCY84131603	116.4595	39.9540	1111027381384	2012-01-01	164.4000	21885	75.1200	3	1	1	1	3	1984	1	6	0.2500	1	1	1	7	2012	1	3	16	N	-0.0423	-0.0212	0.0424
4	https://bj.lianjia.com/chengjiao/BJHD56417864.html	BJHD56417864	116.3815	39.9778	1111027381832	2012-01-01	107.0000	23363	45.8000	1	1	1	1	3	1996	1	6	0.0710	1	1	0	8	2012	1	5	9	N	-0.0423	-0.0212	0.0424

	Model name	MAE	RMSE	Explained_variance_score	Best para
7	RandomForestRegressor w/ BayesSearchCV	4558.5951	7073.6714	0.9149	OrderedDict([('rf__max_depth', 100), ('rf__min_samples_leaf', 1), ('rf__min_samples_split', 2), ('rf__n_estimators', 600)])
9	GradientBoostingRegressor w/ BayesSearchCV	4196.3939	6513.0618	0.9278	OrderedDict([('gb__learning_rate', 0.11413168586297671), ('gb__max_depth', 8), ('gb__n_estimators', 1000)])
10	Stacking ensemble	4111.6664	6431.3238	0.9296	NaN

	url	id	Lng	Lat	Cid	tradeTime	followers	totalPrice	price	square	livingRoom	drawingRoom	kitchen	bathRoom	buildingType	constructionTime	renovationCondition	buildingStructure	ladderRatio	elevator	fiveYearsProperty	subway	district	year	month	floorType	floorHeight	DOM_cat	10yrBond_up%	hmi_up%	sse_up%	modelPrice_stack
312626	https://bj.lianjia.com/chengjiao/101101989805.html	101101989805	116.5854	39.9119	1111027377521	2018-01-22	98	261.0000	37244	70.0800	2	1	1	1	1	2010	4	6	0.2000	1	1	1	7	2018	1	3	22	L	0.0074	-0.0064	0.0525	43307.7790
312627	https://bj.lianjia.com/chengjiao/101102201096.html	101102201096	116.2203	40.0436	1111027376008	2018-01-22	112	830.0000	51096	162.4400	4	2	1	2	4	2009	4	6	0.5000	1	1	0	8	2018	1	5	8	L	0.0074	-0.0064	0.0525	59094.0614
312628	https://bj.lianjia.com/chengjiao/101102219310.html	101102219310	116.3464	40.0014	1111027380544	2018-01-24	1	460.0000	76159	60.4000	2	1	1	1	4	1986	3	2	0.5000	0	1	1	8	2018	1	3	6	L	0.0074	-0.0064	0.0525	88311.6216
312629	https://bj.lianjia.com/chengjiao/101102448228.html	101102448228	116.4415	39.8721	1111027374184	2018-01-26	0	410.0000	63468	64.6000	2	1	1	1	1	1995	1	6	0.2500	1	0	1	2	2018	1	5	25	L	0.0074	-0.0064	0.0525	67889.6779
312630	https://bj.lianjia.com/chengjiao/101102321677.html	101102321677	116.4170	40.0068	1111027375671	2018-01-28	53	345.0000	71577	48.2000	1	1	1	1	4	1994	4	2	0.5000	0	0	1	7	2018	1	5	6	L	0.0074	-0.0064	0.0525	77901.4985

	price	modelPrice_stack	gap
312572	20128	58948.2200	-38820.2200
312518	84000	108788.5022	-24788.5022
312581	90266	114469.8891	-24203.8891
312499	102593	126578.3969	-23985.3969
312580	92575	112260.5957	-19685.5957

	url	id	Lng	Lat	Cid	tradeTime	followers	totalPrice	price	square	livingRoom	drawingRoom	kitchen	bathRoom	buildingType	constructionTime	renovationCondition	buildingStructure	ladderRatio	elevator	fiveYearsProperty	subway	district	year	month	floorType	floorHeight	DOM_cat	10yrBond_up%	hmi_up%	sse_up%	modelPrice_stack	gap	priceTooLow
312420	https://bj.lianjia.com/chengjiao/101102428419.html	101102428419	116.4172	39.9662	1111027380878	2018-01-01	30	620.0000	83311	74.4200	3	1	1	1	4	1976	3	2	0.5000	0	1	1	1	2018	1	5	5	M	0.0074	-0.0064	0.0525	98291.2766	-14980.2766	1
312422	https://bj.lianjia.com/chengjiao/101102212168.html	101102212168	116.3306	39.8920	1111027380537	2018-01-01	220	405.0000	70143	57.7400	2	1	1	1	4	1990	3	2	0.3330	0	0	1	10	2018	1	5	6	L	0.0074	-0.0064	0.0525	83152.8371	-13009.8371	1
312432	https://bj.lianjia.com/chengjiao/101101811228.html	101101811228	116.3523	40.0624	1111027382146	2018-01-01	58	660.0000	64328	102.6000	3	2	1	1	1	1996	3	6	0.2500	1	1	1	8	2018	1	5	22	L	0.0074	-0.0064	0.0525	78807.7570	-14479.7570	1
312433	https://bj.lianjia.com/chengjiao/101102449681.html	101102449681	116.2324	40.2355	1111027380837	2018-01-01	0	131.0000	39697	33.0000	1	1	1	1	4	1984	1	2	0.3330	0	0	0	6	2018	1	1	5	S	0.0074	-0.0064	0.0525	55479.8327	-15782.8327	1
312499	https://bj.lianjia.com/chengjiao/101102388385.html	101102388385	116.3586	39.9321	1111027376241	2018-01-02	49	415.5000	102593	40.5000	1	1	1	1	1	1992	3	2	0.2000	1	1	1	10	2018	1	5	20	L	0.0074	-0.0064	0.0525	126578.3969	-23985.3969	1

	Condition before renovation	price before	price after	gap	percent increase
312544	1	86310.5519	88018.6845	1708.1327	1.9791
312547	1	71396.3293	73315.1875	1918.8581	2.6876
312549	3	79108.8329	80763.9368	1655.1039	2.0922
312580	3	112260.5957	116142.8875	3882.2918	3.4583
312581	3	114469.8891	115835.9889	1366.0998	1.1934
312609	1	116670.0002	118708.5940	2038.5938	1.7473
312624	3	103665.3564	107134.1602	3468.8038	3.3462

	Condition before renovation	price before	price after	gap	percent increase
count	7.0000	7.0000	7.0000	7.0000	7.0000
mean	2.1429	97697.3651	99988.4913	2291.1263	2.3577
std	1.0690	18510.9003	18876.5085	976.4876	0.8406
min	1.0000	71396.3293	73315.1875	1366.0998	1.1934
25%	1.0000	82709.6924	84391.3106	1681.6183	1.8632
50%	3.0000	103665.3564	107134.1602	1918.8581	2.0922
75%	3.0000	113365.2424	115989.4382	2753.6988	3.0169
max	3.0000	116670.0002	118708.5940	3882.2918	3.4583

	renovationCondition	1	2	3	4
modelPrice_stack	count	27.0000	3.0000	75.0000	116.0000
	mean	62790.2013	85143.7201	66682.8790	64544.8123
	std	24143.8350	48543.2395	21695.3674	23287.8676
	min	31056.3548	52911.6294	30265.9049	30234.8589
	25%	44957.7321	57228.2974	49769.5435	47173.4739
	50%	55479.8327	61544.9653	64129.9297	60388.3475
	75%	78978.0442	101259.7654	78958.2949	75802.5472
	max	125663.0349	140974.5655	126578.3969	132175.4832
Permutation1	count	27.0000	3.0000	75.0000	116.0000
	mean	63112.5556	51662.0000	60146.0933	58879.3448
	std	24293.3946	17638.3102	19986.6952	22991.5126
	min	24771.0000	38704.0000	32773.0000	20128.0000
	25%	43335.5000	41618.5000	45209.0000	42405.0000
	50%	61218.0000	44533.0000	55917.0000	52781.5000
	75%	80808.0000	58141.0000	69002.0000	71589.7500
	max	122165.0000	71749.0000	142120.0000	124690.0000

	url	id	Lng	Lat	Cid	tradeTime	followers	totalPrice	price	square	livingRoom	drawingRoom	kitchen	bathRoom	buildingType	constructionTime	renovationCondition	buildingStructure	ladderRatio	elevator	fiveYearsProperty	subway	district	year	month	floorType	floorHeight	DOM_cat	10yrBond_up%	hmi_up%	sse_up%
312410	https://bj.lianjia.com/chengjiao/101102425915.html	101102425915	116.6553	40.1392	1111027381518	2018-01-01	3	223.0000	40022	55.7200	1	1	1	1	4	1989	4	2	0.5000	0	1	1	13	2018	1	0	6	M	0.0074	-0.0064	0.0525
312411	https://bj.lianjia.com/chengjiao/101102287189.html	101102287189	116.4358	39.7981	1111027379000	2018-01-01	106	395.0000	44533	88.7000	2	1	1	1	3	2010	4	6	0.2500	1	0	0	2	2018	1	1	10	L	0.0074	-0.0064	0.0525
312412	https://bj.lianjia.com/chengjiao/101102356574.html	101102356574	116.4776	39.9537	1111027382733	2018-01-01	15	332.0000	71032	46.7400	1	1	1	1	4	1996	4	2	0.5000	0	1	1	7	2018	1	3	6	L	0.0074	-0.0064	0.0525
312413	https://bj.lianjia.com/chengjiao/101102281973.html	101102281973	116.6846	39.9111	1111027381219	2018-01-01	48	350.0000	47164	74.2100	1	1	1	1	3	2003	4	6	0.5000	1	1	0	11	2018	1	5	12	L	0.0074	-0.0064	0.0525
312414	https://bj.lianjia.com/chengjiao/101102394663.html	101102394663	116.5551	39.9574	1111041166357	2018-01-01	20	407.0000	45531	89.3900	3	1	1	1	4	2011	4	6	0.2500	0	1	0	7	2018	1	5	21	L	0.0074	-0.0064	0.0525

	url	id	Lng	Lat	Cid	tradeTime	followers	totalPrice	price	square	livingRoom	drawingRoom	kitchen	bathRoom	buildingType	constructionTime	renovationCondition	buildingStructure	ladderRatio	elevator	fiveYearsProperty	subway	district	year	month	floorType	floorHeight	DOM_cat	10yrBond_up%	hmi_up%	sse_up%	modelPrice_stack	gap	Permutation1	price_renovation3to4	renovTo4_pct
312454	https://bj.lianjia.com/chengjiao/101101748885.html	101101748885	116.3534	39.8756	1111027378762	2018-01-01	10	646.0000	61937	104.3000	3	1	1	2	1	2001	3	6	0.2500	1	0	0	2	2018	1	5	30	L	0.0074	-0.0064	0.0525	65895.8819	-3958.8819	46910	68556.7300	4.0380
312462	https://bj.lianjia.com/chengjiao/101102304111.html	101102304111	116.4332	40.0087	1111027377294	2018-01-02	28	624.5000	94593	66.0200	1	2	1	1	4	2006	1	6	0.6670	1	1	1	7	2018	1	3	15	L	0.0074	-0.0064	0.0525	96324.9894	-1731.9894	71328	101760.7891	5.6432
312483	https://bj.lianjia.com/chengjiao/101102344435.html	101102344435	116.5739	39.9288	1111027376842	2018-01-02	27	407.0000	44849	90.7500	2	1	1	1	3	2006	3	6	0.6670	1	1	1	7	2018	1	3	18	L	0.0074	-0.0064	0.0525	50672.6315	-5823.6315	53982	53254.1890	5.0946
312512	https://bj.lianjia.com/chengjiao/101101498405.html	101101498405	116.5044	39.9294	1111027376016	2018-01-02	77	284.0000	55350	51.3100	1	1	1	1	4	1994	2	2	0.3330	0	1	1	7	2018	1	0	6	L	0.0074	-0.0064	0.0525	61544.9653	-6194.9653	71749	64495.1746	4.7936
312524	https://bj.lianjia.com/chengjiao/101102118625.html	101102118625	116.3229	40.0886	1111027379919	2018-01-02	39	430.0000	33180	129.6000	3	2	1	2	4	2008	3	2	0.5000	0	1	0	6	2018	1	5	6	L	0.0074	-0.0064	0.0525	38054.1988	-4874.1988	37314	39536.5144	3.8953
312526	https://bj.lianjia.com/chengjiao/101102070706.html	101102070706	116.4046	39.9785	1111027375686	2018-01-02	169	373.0000	65600	56.8600	2	1	1	1	4	1986	3	2	0.5000	0	0	1	7	2018	1	5	6	L	0.0074	-0.0064	0.0525	73257.1512	-7657.1512	121191	79093.3696	7.9668
312545	https://bj.lianjia.com/chengjiao/101102088408.html	101102088408	116.4689	39.8825	1111027378493	2018-01-03	134	325.5000	55804	58.3300	1	1	1	1	1	1989	1	6	0.2500	1	1	1	7	2018	1	5	17	L	0.0074	-0.0064	0.0525	57716.6167	-1912.6167	41822	60098.5980	4.1270
312555	https://bj.lianjia.com/chengjiao/101102437883.html	101102437883	116.3421	39.7497	1111027378968	2018-01-03	20	222.0000	39728	55.8800	2	1	1	1	4	1997	3	2	0.5000	0	0	1	4	2018	1	2	6	M	0.0074	-0.0064	0.0525	48897.9008	-9169.9008	55882	50837.9111	3.9675
312558	https://bj.lianjia.com/chengjiao/101102403623.html	101102403623	116.4249	39.9898	1111027378004	2018-01-03	31	1002.0000	54602	183.5100	4	1	1	2	1	1995	3	6	0.3330	1	1	1	7	2018	1	5	19	L	0.0074	-0.0064	0.0525	55707.9042	-1105.9042	62131	61541.2313	10.4713
312592	https://bj.lianjia.com/chengjiao/101102414473.html	101102414473	116.4205	39.9979	1111027375605	2018-01-03	18	585.0000	69710	83.9200	3	1	1	1	1	1991	3	6	0.2500	1	0	0	7	2018	1	3	25	L	0.0074	-0.0064	0.0525	76840.7105	-7130.7105	45974	80425.2783	4.6649
312601	https://bj.lianjia.com/chengjiao/101102217765.html	101102217765	116.3349	39.7440	1111027378789	2018-01-03	23	270.0000	25902	104.2400	2	1	1	1	3	2006	1	6	0.2730	1	1	1	4	2018	1	0	19	L	0.0074	-0.0064	0.0525	36958.1370	-11056.1370	92575	38730.1515	4.7947
312605	https://bj.lianjia.com/chengjiao/101102278923.html	101102278923	116.4575	39.8764	1111027375273	2018-01-03	21	479.0000	56923	84.1500	2	1	1	1	1	2009	3	6	0.2500	1	0	1	7	2018	1	0	28	L	0.0074	-0.0064	0.0525	45196.4565	11726.5435	92269	48964.5491	8.3371
312611	https://bj.lianjia.com/chengjiao/101102434018.html	101102434018	116.4231	39.6764	1111027381935	2018-01-03	0	219.0000	24771	88.4100	2	1	1	1	4	2012	1	6	0.3330	0	0	0	4	2018	1	5	6	M	0.0074	-0.0064	0.0525	31056.3548	-6285.3548	41689	32351.5915	4.1706
312623	https://bj.lianjia.com/chengjiao/101102443459.html	101102443459	116.6474	39.8963	1111027380847	2018-01-03	1	405.0000	42718	94.8100	2	2	1	2	4	2004	3	2	0.5000	0	1	0	11	2018	1	0	6	M	0.0074	-0.0064	0.0525	50234.4308	-7516.4308	65600	52215.9362	3.9445

	url	id	Lng	Lat	Cid	tradeTime	followers	totalPrice	price	square	livingRoom	drawingRoom	kitchen	bathRoom	buildingType	constructionTime	renovationCondition	buildingStructure	ladderRatio	elevator	fiveYearsProperty	subway	district	year	month	floorType	floorHeight	DOM_cat	10yrBond_up%	hmi_up%	sse_up%	modelPrice_stack	gap	Permutation1	price_renovation3to4	renovTo4_pct	renoPlus
312469	https://bj.lianjia.com/chengjiao/101102086891.html	101102086891	116.4621	39.9238	1111027380611	2018-01-02	27	2260.0000	124690	181.2500	3	1	1	2	3	2008	4	6	0.7500	1	1	1	7	2018	1	5	25	L	0.0074	-0.0064	0.0525	98385.0763	26304.9237	26184	98385.0763	0.0000	0
312576	https://bj.lianjia.com/chengjiao/101101814992.html	101101814992	116.5038	39.9307	1111027374762	2018-01-03	82	1460.0000	80896	180.4800	3	1	1	2	1	2011	4	6	0.1670	0	0	0	7	2018	1	5	32	L	0.0074	-0.0064	0.0525	68910.0095	11985.9905	36747	68910.0095	0.0000	0
312605	https://bj.lianjia.com/chengjiao/101102278923.html	101102278923	116.4575	39.8764	1111027375273	2018-01-03	21	479.0000	56923	84.1500	2	1	1	1	1	2009	3	6	0.2500	1	0	1	7	2018	1	0	28	L	0.0074	-0.0064	0.0525	45196.4565	11726.5435	92269	48964.5491	8.3371	1